CorePhrase: Keyphrase Extraction for Document Clustering

نویسندگان

  • Khaled M. Hammouda
  • Diego N. Matute
  • Mohamed S. Kamel
چکیده

The ability to discover the topic of a large set of text documents using relevant keyphrases is usually regarded as a very tedious task if done by hand. Automatic keyphrase extraction from multi-document data sets or text clusters provides a very compact summary of the contents of the clusters, which often helps in locating information easily. We introduce an algorithm for topic discovery using keyphrase extraction from multi-document sets and clusters based on frequent and significant shared phrases between documents. The keyphrases extracted by the algorithm are highly accurate and fit the cluster topic. The algorithm is independent of the domain of the documents. Subjective as well as quantitative evaluation show that the algorithm outperforms keyword-based cluster-labeling algorithms, and is capable of accurately discovering the topic, and often ranking it in the top one or two extracted keyphrases.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction

Previous methods usually conduct the keyphrase extraction task for single documents separately without interactions for each document, under the assumption that the documents are considered independent of each other. This paper proposes a novel approach named CollabRank to collaborative single-document keyphrase extraction by making use of mutual influences of multiple documents within a cluste...

متن کامل

Clustering to Find Exemplar Terms for Keyphrase Extraction

Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores are proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also have a good coverage of the document. Based on this observation, we propose an u...

متن کامل

Domain-Specific Keyphrase Extraction

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive...

متن کامل

Domain-speciic Keyphrase Extraction

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive...

متن کامل

Domain - Speci c Keyphrase Extraction

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005